Punish/Reward: Learning with a Critic in Adaptive Threshold Systems
Authors
Abstract
An adaptive threshold element is able to "learn" a strategy of play for the game blackjack (twenty-one) with a performance close to that of the Thorp optimal strategy, although the adaptive system has no prior knowledge of the game or of the objective of play. After each winning game the decisions of the adaptive system are "rewarded." After each losing game the decisions are "punished." Reward is accomplished by adapting while accepting the actual decision as the desired response. Punishment is accomplished by adapting while taking the desired response to be the opposite of the actual decision. This learning scheme is unlike "learning with a teacher" and unlike "unsupervised learning." It involves "bootstrap adaptation" or "learning with a critic." The critic rewards decisions which are members of successful chains of decisions and punishes other decisions. A general analytical model for learning with a critic is formulated and analyzed. The model represents bootstrap learning per se. Although the hypotheses on which the model is based do not perfectly fit blackjack learning, it is applied heuristically to predict adaptation rates with good experimental success. New applications are being explored for bootstrap learning in adaptive controls and multilayered adaptive systems.
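The punish/reward rule described in the abstract maps directly onto an adaline-style weight update. The sketch below is a minimal illustration, assuming a bipolar (+1/-1) threshold decision and an LMS-type update; the class name, learning rate, and method names are illustrative and not taken from the paper.

```python
import numpy as np

class BootstrapAdaline:
    """Adaptive threshold element trained by punish/reward (learning with a critic)."""

    def __init__(self, n_inputs, lr=0.05):
        self.w = np.zeros(n_inputs + 1)   # weights plus a bias weight
        self.lr = lr

    def decide(self, x):
        """Return the binary decision (+1 or -1) for input pattern x."""
        x = np.append(x, 1.0)             # augment with constant bias input
        return 1.0 if self.w @ x >= 0 else -1.0

    def adapt(self, x, decision, won):
        """Adapt once the game's outcome is known.

        Reward (won=True): treat the actual decision as the desired response.
        Punish (won=False): treat the opposite of the decision as desired.
        """
        x = np.append(x, 1.0)
        desired = decision if won else -decision
        error = desired - self.w @ x      # LMS-style error toward the desired response
        self.w += self.lr * error * x
```

In a blackjack-style use, every (input pattern, decision) pair recorded during a game would be replayed through adapt after the game ends, with won set by the outcome, so that whole chains of decisions are rewarded or punished together.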
Similar Resources
Fast Learning in an Actor-Critic Architecture with Reward and Punishment
A reinforcement architecture is introduced that consists of three complementary learning systems with different generalization abilities. The ACTOR learns state-action associations, the CRITIC learns a goal-gradient, and the PUNISH system learns what actions to avoid. The architecture is compared to the standard actor-critic and Q-learning models on a number of maze learning tasks. The novel a...
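Read at face value, this entry describes three tabular learning systems working together. The sketch below is only an illustrative reconstruction of that idea, not the referenced paper's architecture: a softmax actor over state-action preferences, a one-step TD critic serving as the goal gradient, and a separate punishment table that suppresses actions followed by negative reward; all names and learning rates are assumptions.

```python
import numpy as np

class ActorCriticPunish:
    def __init__(self, n_states, n_actions,
                 alpha_actor=0.1, alpha_critic=0.1, alpha_punish=0.5, gamma=0.95):
        self.H = np.zeros((n_states, n_actions))   # actor: state-action preferences
        self.V = np.zeros(n_states)                # critic: state values (goal gradient)
        self.P = np.zeros((n_states, n_actions))   # punish: accumulated avoidance
        self.alpha_actor = alpha_actor
        self.alpha_critic = alpha_critic
        self.alpha_punish = alpha_punish
        self.gamma = gamma

    def act(self, s, rng):
        # Softmax over actor preferences minus accumulated punishment.
        prefs = self.H[s] - self.P[s]
        probs = np.exp(prefs - prefs.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    def update(self, s, a, reward, s_next, done):
        # Critic: one-step TD error toward the discounted goal gradient.
        target = reward + (0.0 if done else self.gamma * self.V[s_next])
        delta = target - self.V[s]
        self.V[s] += self.alpha_critic * delta
        # Actor: strengthen or weaken the taken action by the TD error.
        self.H[s, a] += self.alpha_actor * delta
        # Punish system: remember actions that were followed by negative reward.
        if reward < 0:
            self.P[s, a] += self.alpha_punish * (-reward)
```

A caller would construct ActorCriticPunish(n_states, n_actions), choose actions with act(s, np.random.default_rng()), and call update after each transition of a maze episode.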
Adaptive critic for sigma-pi networks
This article presents an investigation of how training of sigma-pi networks with the associative reward-penalty (A_R-P) regime may be enhanced by using two networks in parallel. The technique uses what has been termed an unsupervised "adaptive critic element" (ACE) to give critical advice to the supervised sigma-pi network. We utilise the conventions that the sigma-pi neuron mod...
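For context, a single stochastic unit trained under the associative reward-penalty regime can be written compactly. The sketch below uses the standard Barto-Anandan form of the A_R-P update rather than anything specific to sigma-pi networks or the adaptive critic element described in the entry; the learning-rate values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ARPUnit:
    """Single stochastic binary unit trained with the A_R-P rule."""

    def __init__(self, n_inputs, rho=0.1, lam=0.01, rng=None):
        self.w = np.zeros(n_inputs)
        self.rho = rho                    # learning rate for the reward term
        self.lam = lam                    # penalty term scale relative to rho
        self.rng = rng or np.random.default_rng()

    def act(self, x):
        """Sample a stochastic binary output y in {0, 1} and return (y, p)."""
        p = sigmoid(self.w @ x)
        y = 1.0 if self.rng.random() < p else 0.0
        return y, p

    def update(self, x, y, p, r):
        """A_R-P update given a reward signal r in [0, 1].

        The reward term pushes p toward the emitted output; the penalty term
        pushes p toward the opposite output, scaled down by lambda.
        """
        grad = r * (y - p) + self.lam * (1.0 - r) * ((1.0 - y) - p)
        self.w += self.rho * grad * x
```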
متن کاملDynamic Sensor Tasking for Space Situational Awareness via Reinforcement Learning
This paper studies the Sensor Management (SM) problem for optical Space Object (SO) tracking. The tasking problem is formulated as a Markov Decision Process (MDP) and solved using Reinforcement Learning (RL). The RL problem is solved using the actor-critic policy gradient approach. The actor provides a policy which is random over actions and given by a parametric probability density function (p...
An ART-based fuzzy adaptive learning control network
This paper proposes a reinforcement fuzzy adaptive learning control network (RFALCON), constructed by integrating two fuzzy adaptive learning control networks (FALCON), each of which has a feedforward multilayer network and is developed for the realization of a fuzzy controller. One FALCON performs as a critic network (fuzzy predictor), the other as an action network (fuzzy controller). Using t...
Semi-Markov Adaptive Critic Heuristics with Application to Airline Revenue Management
The adaptive critic heuristic has been a popular algorithm in reinforcement learning (RL) and approximate dynamic programming (ADP) alike. It is one of the first RL and ADP algorithms. RL and ADP algorithms are particularly useful for solving Markov decision processes (MDPs) that suffer from the curses of dimensionality and modeling. Many real-world problems, however, tend to be semi-Markov decis...
Journal: IEEE Trans. Systems, Man, and Cybernetics
Volume: 3, Issue: -
Pages: -
Published: 1973